Causal Regularization
We argue that regularizing terms in standard regression methods not only help against overfitting finite data, but sometimes also help in getting better causal models. We first consider a multi-dimensional variable linearly influencing a target variable with some multi-dimensional unobserved common cause, where the confounding effect can be decreased by keeping the penalizing term in Ridge and Lasso regression even in the population limit. The reason is a close analogy between overfitting and confounding observed for our toy model. In the case of overfitting, we can choose regularization constants via cross-validation, but here we choose the regularization constant by first estimating the strength of confounding, which yielded reasonable results for simulated and real data. Further, we show a 'causal generalization bound' which states (subject to our particular model of confounding) that the error made by interpreting any non-linear regression as a causal model can be bounded from above whenever the functions are taken from a not too rich class.
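The population-limit analogy between confounding and overfitting can be illustrated with a small numerical sketch (all variable names, dimensions, and parameter values here are illustrative, not taken from the paper): an unobserved confounder Z drives both the covariates X and the target Y, so plain OLS absorbs the confounding bias, while a ridge penalty that shrinks the coefficients can land closer to the true causal effect even with infinite data.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 10, 5                               # observed dim, confounder dim
M = rng.normal(size=(d, k)) / np.sqrt(d)   # confounder -> covariate loadings
a = 0.1 * rng.normal(size=d)               # true causal effect (weak)
c = rng.normal(size=k)                     # confounder -> target (strong)
s2 = 0.09                                  # covariate noise variance

# Population covariances for X = M Z + eps, Y = a.X + c.Z + noise
Sigma_X = M @ M.T + s2 * np.eye(d)
cov_XY = Sigma_X @ a + M @ c               # Cov(X, Y)

def ridge_population(lam):
    """Population-limit ridge coefficients for penalty lam (lam=0 is OLS)."""
    return np.linalg.solve(Sigma_X + lam * np.eye(d), cov_XY)

errors = {lam: np.linalg.norm(ridge_population(lam) - a)
          for lam in [0.0, 0.1, 0.3, 1.0, 3.0, 10.0]}
# lam = 0 (OLS) absorbs the full confounding bias Sigma_X^{-1} M c,
# even though we used exact population covariances (infinite data).
best_lam = min((l for l in errors if l > 0), key=errors.get)
print(f"OLS error {errors[0.0]:.2f}, ridge(lam={best_lam}) error {errors[best_lam]:.2f}")
```

With a weak causal effect and strong confounding, some positive penalty beats the unregularized solution; the paper's point is that a good penalty can be picked by estimating the confounding strength rather than by cross-validation.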
Reviews: Fast Sparse Group Lasso
Summary: This paper presents a fast block coordinate descent algorithm for the sparse-group lasso problem. Two strategies are proposed to improve computational efficiency. The first quickly identifies groups of inactive features, using an easy-to-compute upper bound when checking whether the inactive-group condition holds. The second selects a set of candidate groups and updates the feature vectors inside those groups first, before iterating over all groups.
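A minimal sketch of the inactive-group screening idea, assuming the standard sparse-group-lasso KKT condition for a zero group (the function names and the particular cheap bound are illustrative, not the paper's actual algorithm):

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding operator S(v, t)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def inactive_groups(X, r, groups, lam, alpha):
    """Screen groups whose coefficients must be zero at the optimum.

    Sparse-group-lasso KKT condition for a zero group g:
        || S(X_g^T r, alpha*lam) ||_2 <= (1 - alpha) * lam * sqrt(|g|)
    Cheap bound: ||S(v, t)||_2 <= ||v||_2, so a group can be discarded
    immediately when ||X_g^T r||_2 is already below the threshold,
    skipping the soft-thresholding in the exact check.
    """
    inactive = []
    for g in groups:
        corr = X[:, g].T @ r
        thresh = (1 - alpha) * lam * np.sqrt(len(g))
        if np.linalg.norm(corr) <= thresh:              # cheap upper bound
            inactive.append(g)
        elif np.linalg.norm(soft_threshold(corr, alpha * lam)) <= thresh:
            inactive.append(g)                          # exact check
    return inactive
```

The cheap bound is conservative in the safe direction: whenever it fires, the exact condition holds as well, so no active group is ever discarded.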
Convex Tensor Decomposition via Structured Schatten Norm Regularization
We study a new class of structured Schatten norms for tensors that includes two recently proposed norms ("overlapped" and "latent") for convex-optimization-based tensor decomposition. We analyze the performance of the "latent" approach for tensor decomposition, which was empirically found to perform better than the "overlapped" approach in some settings, and show theoretically that this is indeed the case. In particular, when the unknown true tensor is low-rank in a specific unknown mode, this approach performs as well as knowing the mode with the smallest rank. Along the way, we show a novel duality result for structured Schatten norms, which is also interesting in the general context of structured sparsity. We confirm through numerical simulations that our theory can precisely predict the scaling behaviour of the mean squared error.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
- Africa > Senegal > Kolda Region > Kolda (0.04)
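Of the two norms, the "overlapped" Schatten-1 norm is straightforward to evaluate directly as a sum of nuclear norms of the mode unfoldings; a minimal sketch (function names are ours):

```python
import numpy as np

def unfold(T, mode):
    """Mode-k unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def overlapped_schatten1(T):
    """Overlapped Schatten-1 norm: sum over modes of the nuclear norm
    (sum of singular values) of each mode unfolding."""
    return sum(np.linalg.norm(unfold(T, k), 'nuc') for k in range(T.ndim))
```

The "latent" norm, by contrast, is an infimum over decompositions T = sum_k T^(k) of the sum of mode-k nuclear norms, so evaluating it requires solving an optimization problem (e.g. by ADMM) rather than a closed-form sum.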
Optimal Training of Mean Variance Estimation Neural Networks
Sluijterman, Laurens, Cator, Eric, Heskes, Tom
This paper focuses on the optimal implementation of a Mean Variance Estimation network (MVE network) (Nix and Weigend, 1994). This type of network is often used as a building block for uncertainty estimation methods in a regression setting, for instance Concrete dropout (Gal et al., 2017) and Deep Ensembles (Lakshminarayanan et al., 2017). Specifically, an MVE network assumes that the data is produced from a normal distribution with a mean function and variance function. The MVE network outputs a mean and variance estimate and optimizes the network parameters by minimizing the negative log-likelihood. In our paper, we present two significant insights. Firstly, the convergence difficulties reported in recent work can be relatively easily prevented by following the simple yet often overlooked recommendation from the original authors that a warm-up period should be used. During this period, only the mean is optimized with a fixed variance. We demonstrate the effectiveness of this step through experimentation, highlighting that it should be standard practice. As a side note, we examine whether, after the warm-up, it is beneficial to fix the mean while optimizing the variance or to optimize both simultaneously. Here, we do not observe a substantial difference. Secondly, we introduce a novel improvement of the MVE network: separate regularization of the mean and the variance estimate. We demonstrate, both on toy examples and on a number of benchmark UCI regression data sets, that following the original recommendations and the novel separate regularization can lead to significant improvements.
- North America > United States > Massachusetts > Middlesex County > Reading (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands > Gelderland > Nijmegen (0.04)
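The warm-up recommendation can be sketched on a deliberately tiny stand-in model: a linear "network" with mean mu = a*x + b and log-variance c*x + d, trained by gradient descent on the Gaussian negative log-likelihood. All parameter values and step counts here are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 2.0 * x + rng.normal(scale=0.3, size=200)   # true mean 2x, true std 0.3

# Linear stand-in for an MVE network: mu = a*x + b, log(sigma^2) = c*x + d
params = np.zeros(4)

def nll_grad(params, warmup):
    a, b, c, d = params
    mu = a * x + b
    # Warm-up: variance is frozen at 1 (log-variance 0), only the mean learns.
    logvar = np.zeros_like(x) if warmup else c * x + d
    var = np.exp(logvar)
    dmu = (mu - y) / var                                # d NLL / d mu
    dlv = np.zeros_like(x) if warmup else 0.5 * (1.0 - (y - mu) ** 2 / var)
    return np.array([np.mean(dmu * x), np.mean(dmu),
                     np.mean(dlv * x), np.mean(dlv)])

for _ in range(500):                       # phase 1: warm-up, mean only
    params -= 0.5 * nll_grad(params, warmup=True)
for _ in range(2000):                      # phase 2: joint optimization
    params -= 0.05 * nll_grad(params, warmup=False)

a, b, c, d = params
print(f"slope ~ {a:.2f} (true 2.0), std at x=0 ~ {np.exp(d / 2):.2f} (true 0.3)")
```

Starting the joint phase from a reasonable mean fit avoids the failure mode where a badly fitted mean inflates the variance estimate, which in turn kills the mean's gradients; that is the convergence difficulty the warm-up prevents.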
Qini-based Uplift Regression
Belbahri, Mouloud, Murua, Alejandro, Gandouet, Olivier, Nia, Vahid Partovi
This article proposes methodology that identifies characteristics associated with a home insurance policy that can be used to infer the link between marketing intervention and policy renewal rate. Using the resulting statistical model, the goal is to predict which customers the company should focus on in order to deploy future retention campaigns. A subscription-based company loses its customers when they stop doing business with its service. Also known as customer attrition, customer churn can be a drag on business growth. It is less expensive to retain existing customers than to acquire new ones, so businesses put effort into marketing strategies that reduce customer attrition. Customer loyalty, on the other hand, is usually more profitable because the company has already earned the trust of existing customers. Businesses mostly have a defined strategy for fighting customer churn over a period of time. Using available data and what they learn about churn, organizations can measure their success at retaining customers and identify strategies for improvement.
- North America > Canada > Quebec > Montreal (0.04)
- North America > Montserrat (0.04)
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Banking & Finance > Insurance (1.00)
- Marketing (0.87)
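The Qini criterion in the title can be sketched as follows. This is a common textbook form of the Qini curve, computed by ranking customers by predicted uplift; the paper's exact estimator may differ.

```python
import numpy as np

def qini_curve(uplift_score, treated, outcome):
    """Cumulative incremental gains, ranked by predicted uplift.

    At each depth k (top-k customers by predicted uplift):
        Q(k) = sum of treated outcomes  -  sum of control outcomes
               rescaled by the treated/control count ratio n_t(k) / n_c(k).
    A good uplift model concentrates incremental responders at the top,
    so its Qini curve rises above the diagonal of random targeting.
    """
    order = np.argsort(-uplift_score)
    t = treated[order].astype(float)
    y = outcome[order].astype(float)
    n_t = np.cumsum(t)
    n_c = np.cumsum(1 - t)
    gain_t = np.cumsum(y * t)
    gain_c = np.cumsum(y * (1 - t))
    with np.errstate(divide='ignore', invalid='ignore'):
        q = gain_t - np.where(n_c > 0, gain_c * n_t / n_c, 0.0)
    return q
```

The Qini coefficient then summarizes the curve as the area between it and the random-targeting line, analogous to how the Gini coefficient summarizes a Lorenz curve.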
Information asymmetry in KL-regularized RL
Galashov, Alexandre, Jayakumar, Siddhant M., Hasenclever, Leonard, Tirumala, Dhruva, Schwarz, Jonathan, Desjardins, Guillaume, Czarnecki, Wojciech M., Teh, Yee Whye, Pascanu, Razvan, Heess, Nicolas
Many real-world tasks exhibit rich structure that is repeated across different parts of the state space or in time. In this work we study the possibility of leveraging such repeated structure to speed up and regularize learning. We start from the KL-regularized expected reward objective, which introduces an additional component, a default policy. Instead of relying on a fixed default policy, we learn it from data. But crucially, we restrict the amount of information the default policy receives, forcing it to learn reusable behaviours that help the policy learn faster. We formalize this strategy and discuss connections to information bottleneck approaches and to the variational EM algorithm. We present empirical results in both discrete and continuous action domains and demonstrate that, for certain tasks, learning a default policy alongside the policy can significantly speed up and improve learning.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
- (2 more...)
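In the one-step (bandit) special case, the KL-regularized objective max_pi E_pi[r] - alpha * KL(pi || pi_0) has a closed-form maximizer, which gives intuition for how the temperature alpha trades off reward against staying close to the default policy. Note this sketch fixes the default policy pi_0 rather than learning it from restricted information as the paper does.

```python
import numpy as np

def kl_regularized_policy(rewards, default_policy, alpha):
    """Maximizer of E_pi[r] - alpha * KL(pi || pi_0) over one action:
    pi(a) proportional to pi_0(a) * exp(r(a) / alpha)."""
    logits = np.log(default_policy) + rewards / alpha
    w = np.exp(logits - logits.max())       # subtract max for stability
    return w / w.sum()

r = np.array([1.0, 0.0, 0.0])
pi0 = np.ones(3) / 3                        # uniform default policy
strong = kl_regularized_policy(r, pi0, alpha=0.1)   # near-greedy on reward
weak = kl_regularized_policy(r, pi0, alpha=10.0)    # stays near the default
```

Small alpha recovers greedy reward maximization; large alpha pins the policy to the default, which is what makes a well-chosen (or well-learned) default policy such an effective regularizer.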
Deep Demosaicing for Edge Implementation
Ramakrishnan, Ramchalam Kinattinkara, Jui, Shangling, Nia, Vahid Patrovi
Most digital cameras use sensors coated with a Color Filter Array (CFA) to capture channel components at every pixel location, resulting in a mosaic image that does not contain pixel values in all channels. Current research on reconstructing these missing channels, also known as demosaicing, introduces many artifacts, such as the zipper effect and false color. Many deep learning demosaicing techniques outperform classical techniques in reducing the impact of such artifacts. However, most of these models tend to be over-parameterized. Consequently, edge implementation of state-of-the-art deep-learning-based demosaicing algorithms on low-end edge devices is a major challenge. We provide an exhaustive search of deep neural network architectures and obtain a Pareto front of Color Peak Signal to Noise Ratio (CPSNR), as the performance criterion, versus the number of parameters, as the model complexity, that beats the state of the art. Architectures on the Pareto front can then be used to choose the best architecture for a variety of resource constraints. Simple architecture search methods such as exhaustive search and grid search require certain conditions on the loss function to converge to the optimum. We clarify these conditions in a brief theoretical study.
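Extracting the Pareto front from a set of (parameter count, CPSNR) evaluations is a simple dominance filter; a minimal sketch (function name is ours):

```python
def pareto_front(models):
    """models: list of (n_params, cpsnr) pairs.

    Keep models not dominated by any other, i.e. no other model has
    both fewer-or-equal parameters and strictly better CPSNR.
    Sorting by (params asc, cpsnr desc) lets a single pass suffice:
    a model is on the front iff it improves on the best CPSNR seen so far.
    """
    front = []
    for p, q in sorted(models, key=lambda m: (m[0], -m[1])):
        if not front or q > front[-1][1]:
            front.append((p, q))
    return front
```

Given such a front, a deployment simply picks the rightmost architecture whose parameter count fits the device's memory budget.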